
fix: Remove evaluation metric key from schema which failed on some LLMs#105

Merged
jsonbailey merged 11 commits into main from jb/aic-1897/remove-keys-from-evaluation-structure
Mar 16, 2026
Conversation

@jsonbailey
Contributor

@jsonbailey jsonbailey commented Mar 11, 2026

fix: Improve metric token collection for Judge evaluations when using LangChain
fix: Include raw response when performing Judge evaluations


Note

Medium Risk
Updates the judge structured-output contract and parsing logic, plus changes LangChain structured invocation to return parsed/raw data and token usage; this could affect downstream integrations expecting the old schema or metrics behavior.

Overview
Judge evaluations now use a fixed structured-output shape. EvaluationSchemaBuilder no longer bakes evaluation_metric_key into the schema; Judge now expects a top-level {score, reasoning} object and keys the parsed result by the config's metric key, failing the evaluation when the response does not parse into a valid score/reasoning pair.
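The parsing step above can be sketched as follows. This is a hypothetical illustration (the function name `parse_judge_result` and the exact return shape are assumptions, not the SDK's actual code): the judge validates a top-level `{score, reasoning}` payload and re-keys it under the config's metric key, returning `None` to signal a failed evaluation.

```python
from typing import Any, Dict, Optional


def parse_judge_result(
    parsed: Optional[Dict[str, Any]], metric_key: str
) -> Optional[Dict[str, Dict[str, Any]]]:
    """Key a top-level {score, reasoning} payload by the config's metric key.

    Returns None (i.e. a failed evaluation) when the response does not
    parse into a valid score/reasoning pair.
    """
    if not isinstance(parsed, dict):
        return None
    score = parsed.get("score")
    reasoning = parsed.get("reasoning")
    # Reject non-numeric scores and missing/non-string reasoning.
    if not isinstance(score, (int, float)) or isinstance(score, bool):
        return None
    if not isinstance(reasoning, str):
        return None
    return {metric_key: {"score": float(score), "reasoning": reasoning}}
```

Keying by the config's metric key at parse time, rather than baking the key into the schema, is what lets a single fixed schema work across metrics (and across LLMs that choke on dynamic schema keys).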

LangChain structured invocations now capture more telemetry and handle Bedrock better. invoke_structured_model uses include_raw=True, returns only the parsed payload, surfaces raw_response, extracts token usage from either usage_metadata or response_metadata, and treats parsing_error as a failed structured call; provider mapping now routes bedrock and bedrock:* to bedrock_converse and injects Bedrock’s foundation provider parameter when needed.
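The token-usage fallback and provider routing described above can be sketched like this. These helper names (`extract_token_usage`, `map_provider`) and the returned dict shape are assumptions for illustration, not the SDK's actual API; the fallback order (prefer `usage_metadata`, then `response_metadata`) follows the description in the PR.

```python
from types import SimpleNamespace
from typing import Any, Dict


def extract_token_usage(message: Any) -> Dict[str, int]:
    """Pull token counts from a LangChain AIMessage-like object.

    Prefers the first-class usage_metadata attribute and falls back to
    provider-specific counts nested in response_metadata.
    """
    usage = getattr(message, "usage_metadata", None)
    if usage:
        return {
            "input": usage.get("input_tokens", 0),
            "output": usage.get("output_tokens", 0),
            "total": usage.get("total_tokens", 0),
        }
    meta = getattr(message, "response_metadata", None) or {}
    token_usage = meta.get("token_usage") or meta.get("usage") or {}
    return {
        "input": token_usage.get("prompt_tokens", 0),
        "output": token_usage.get("completion_tokens", 0),
        "total": token_usage.get("total_tokens", 0),
    }


def map_provider(provider: str) -> str:
    """Route bedrock and bedrock:* provider names to bedrock_converse."""
    if provider == "bedrock" or provider.startswith("bedrock:"):
        return "bedrock_converse"
    return provider


# Demo object standing in for a LangChain AIMessage with usage_metadata.
msg = SimpleNamespace(
    usage_metadata={"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
    response_metadata={},
)
```

Treating `parsing_error` as a failed structured call (not shown) pairs with `include_raw=True`: the raw response is still surfaced for diagnostics even when the parsed payload is rejected.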

Written by Cursor Bugbot for commit a303c3d. This will update automatically on new commits. Configure here.

@jsonbailey jsonbailey requested a review from a team as a code owner March 11, 2026 22:40

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@jsonbailey jsonbailey merged commit f951dac into main Mar 16, 2026
35 checks passed
@jsonbailey jsonbailey deleted the jb/aic-1897/remove-keys-from-evaluation-structure branch March 16, 2026 20:46
@github-actions github-actions bot mentioned this pull request Mar 16, 2026
jsonbailey added a commit that referenced this pull request Mar 16, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>launchdarkly-server-sdk-ai: 0.16.1</summary>

## [0.16.1](launchdarkly-server-sdk-ai-0.16.0...launchdarkly-server-sdk-ai-0.16.1) (2026-03-16)


### Bug Fixes

* Improve metric token collection for Judge evaluations when using LangChain ([f951dac](f951dac))
* Improve raw response handling when performing Judge evaluations using LangChain ([f951dac](f951dac))
* Simplify judge structured output to improve reliability of judge scores for some LLMs ([#105](#105)) ([f951dac](f951dac))
</details>

<details><summary>launchdarkly-server-sdk-ai-langchain: 0.3.2</summary>

## [0.3.2](launchdarkly-server-sdk-ai-langchain-0.3.1...launchdarkly-server-sdk-ai-langchain-0.3.2) (2026-03-16)


### Bug Fixes

* Improve metric token collection for Judge evaluations when using LangChain ([f951dac](f951dac))
* Improve raw response handling when performing Judge evaluations using LangChain ([f951dac](f951dac))
* Simplify judge structured output to improve reliability of judge scores for some LLMs ([#105](#105)) ([f951dac](f951dac))
* Update comments for setting default ([#99](#99)) ([a14761d](a14761d))
</details>

<details><summary>launchdarkly-server-sdk-ai-openai: 0.2.1</summary>

## [0.2.1](launchdarkly-server-sdk-ai-openai-0.2.0...launchdarkly-server-sdk-ai-openai-0.2.1) (2026-03-16)


### Bug Fixes

* Update comments for setting default ([#99](#99)) ([a14761d](a14761d))
</details>

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See the [documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> This is a Release Please version/changelog update only, touching manifests and package metadata but no functional runtime code. Risk is low aside from potential release/versioning inconsistencies if any file was missed.
>
> **Overview**
> Publishes a new release by bumping versions for `launchdarkly-server-sdk-ai` to `0.16.1`, `launchdarkly-server-sdk-ai-langchain` to `0.3.2`, and `launchdarkly-server-sdk-ai-openai` to `0.2.1` (manifest, `pyproject.toml`, and `ldai.__version__`).
>
> Updates changelogs (and the `PROVENANCE.md` version snippet) to reflect the included bug fixes around Judge evaluation output/metrics and documentation comments.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 380481b. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>